--- title: Where to Find the Best Beer in the US author: Chad Peltier date: '2020-12-07' slug: where-to-find-the-best-beer-in-the-us categories: - R tags: - Geospatial ---
The amazing TidyTuesday project has listed two beer-focused datasets this year, one focused on beer production by state, and the other on Great American Beer Fest (GABF) awards.
From those, I was originally interested in joining the GABF awards data with beer reviews data from BeerAdvocate… but joining the data (off of brewery and beer names) has turned out to be messy and time consuming. So in the meantime, I wanted to share some data just from scraping Beer Advocate reviews.
Scraping was a 3-part process:
Overall, I collected the following data for each beer:
library(tidytuesdayR)
library(rvest)
library(janitor)
library(tidyverse)
library(sf)
library(leaflet)
library(patchwork)
library(tidytext)
library(httr)
Scraping the styles information was straightforward – go to the styles page, find the style html nodes, and pull the linked url from the attributes.
Each style reviews page has a unique identifier number. I then created a dataframe that combined all of the style ID numbers with the page numbers of reviews. For example, page 2 of the American Barleywines style page is “https://www.beeradvocate.com/beer/styles/19/?sort=revsD&start=50” – so I repeated the ending digits in steps of 50 from 0 to 5000 in order to get the 5000 most-reviewed beers per substyle.
It would have been possible to pull either all the beers on BA or to pull it based on average rating, but since I was only interested in beers with more than 10 reviews (and I ended up collecting 4x the number of beers than I ended up using because they had fewer than 10 reviews, anyway) this strategy worked fine.
styles <- read_html(paste0("https://www.beeradvocate.com/beer/styles/"))
style_nums <- styles %>%
html_nodes("a") %>%
html_attr("href") %>%
tibble() %>%
rename("links" = 1) %>%
filter(str_detect(links, "styles/")) %>%
mutate(style_num = parse_number(links)) %>%
drop_na(style_num)
pages <- seq(from = 0, to = 5000, by = 50)
style_nums2 <- rep(style_nums$style_num, times = length(pages)) %>%
enframe() %>%
arrange(value) %>%
pull(value)
pages2 <- rep(pages, times = (length(style_nums2) / length(pages)))
style_pages <- tibble(style_nums2, pages2)
Next came scraping the tables from the individual style pages. It consisted of pulling the table itself and then adding in columns for the link to the individual beer pages as well as the beer substyle. I wrapped all of those pulls into a function that I then purrr::map2’d on the style_pages dataframe I created above (which contained all possible substyle review page urls).
get_reviews_style <- function(style_num, page){
url <- read_html(paste0("https://www.beeradvocate.com/beer/styles/",
style_num,
"/?sort=revD&start=", page) )
stats <- url %>%
html_nodes("table") %>%
html_table(fill = TRUE) %>%
tibble() %>%
unnest(".") %>%
slice(-c(1:2)) %>%
row_to_names(1) %>%
remove_empty() %>%
slice(1:n()-1)
style <- url %>%
html_node("h1") %>%
html_text()
beer_links <- url %>%
html_nodes("a") %>%
html_attr("href") %>%
tibble() %>%
rename("links" = 1) %>%
filter(str_detect(links, "\\/beer\\/profile\\/\\d+\\/\\d+\\/")) %>%
mutate(links = paste0("https://beeradvocate.com", links))
stats <- stats %>%
mutate(beer_style = style) %>%
bind_cols(beer_links)
}
top_beers <- map2(style_pages$style_nums2, style_pages$pages2,
~ get_reviews_style(.x, .y)) %>%
bind_rows()
## Clean and filter to only beers with > 20 ratings, adds broad style and substyle cols
top_beers2 <- top_beers %>%
clean_names() %>%
mutate(broad_style = str_extract(beer_style, ".+(?= - )"),
sub_style = str_extract(beer_style, "(?<= - ).+"),
broad_style = if_else(is.na(broad_style), beer_style, broad_style),
sub_style = if_else(is.na(sub_style), beer_style, sub_style),
ratings = parse_number(ratings),
abv = parse_number(abv),
avg = parse_number(avg),
brewery = str_remove(brewery, "-.+"),
brewery = str_remove(brewery, " & Tasting Room"),
brewery = str_remove(brewery, " & Grill"),
brewery = str_remove(brewery, " & Kitchen"),
brewery = str_remove(brewery, " & Brewpub"),
brewery = str_remove(brewery, " Co\\."),
brewery = str_remove(brewery, " Company"),
brewery = str_trim(brewery),
across(c(abv, ratings,avg), as.numeric)) %>%
filter(ratings > 10) %>%
mutate(group_num = rep(1:10, each = 4692, length.out = nrow(.)))
Finally, there was a little more data from the beers’ individual review pages that I also pulled. This included the brewery’s state or country and the BA score (as opposed to BA users’ average ratings).
There is also code below to scrape the most recent 25 user reviews text, but I didn’t end up running that code section for the analysis below, since pulling all of the data took > 6 hours as it was. I also had to break up that purrr::map() function into 5 separate pulls (although it’s not reflected in the code chunk below) because I kept getting timeout errors from rvest connecting with the BA site.
get_more_beer_info <- function(url){
## Brewery state, BA score
state_names <- state.name
country_names <- str_trim(ISOcodes::UN_M.49_Countries$Name)
info <- read_html(url) %>%
html_nodes("div") %>%
html_text() %>%
enframe() %>%
mutate(value = str_remove_all(value, "\n"),
value = str_replace(value, "Avail", " Avail")) %>%
filter(str_detect(value, "^Beer Geek Stats:")) %>%
mutate(state = str_extract(value, paste(state_names, collapse = "|")),
country = str_extract(value, paste(country_names, collapse = "|")),
score = str_extract(value, "(?<=Score:)\\d+"),
links = url)
## Most recent 25 user reviews
reviews <- read_html(url) %>%
html_nodes("#rating_fullview_content_2") %>%
html_text() %>%
enframe() %>%
mutate(value = str_remove_all(value, ".+(?=overall:)"),
value = str_remove_all(value, "\n")) %>%
summarize(reviews = paste(value, collapse = "|")) %>%
mutate(links = url)
info %>%
left_join(reviews, by = "links")
}
## map across all beers in top_beers2 df
more_beer_info <- map_df(top_beers2$links, get_more_beer_info)
## join with top_beers2 df
top_beers3 <- top_beers2 %>%
left_join(more_beer_info, by = "links") %>%
mutate(score = as.numeric(score))
Here are basic counts for how many beers we have in the final data by both broad style and substyle. LOTS of IPAs, although that shoudl surprise no one :) The top four substyles are all IPAs or American pales.
top_beers3 %>%
count(broad_style, sort = TRUE) %>%
head(20)
## # A tibble: 20 x 2
## broad_style n
## <chr> <int>
## 1 IPA 12624
## 2 Stout 5602
## 3 Lager 4080
## 4 Pale Ale 3618
## 5 Wheat Beer 2230
## 6 Farmhouse Ale 2023
## 7 Porter 2011
## 8 Wild Ale 1588
## 9 Red Ale 1486
## 10 Pilsner 1324
## 11 Sour 1301
## 12 Strong Ale 1100
## 13 Brown Ale 1097
## 14 Fruit and Field Beer 961
## 15 Blonde Ale 869
## 16 Bock 661
## 17 Barleywine 603
## 18 Bitter 559
## 19 Tripel 506
## 20 Kölsch 434
top_beers3 %>%
count(beer_style, sort = TRUE) %>%
head(20)
## # A tibble: 20 x 2
## beer_style n
## <chr> <int>
## 1 IPA - American 5050
## 2 IPA - Imperial 3912
## 3 Pale Ale - American 2758
## 4 IPA - New England 2456
## 5 Stout - American Imperial 2076
## 6 Farmhouse Ale - Saison 1867
## 7 Wild Ale 1588
## 8 Red Ale - American Amber / Red 1139
## 9 Porter - American 1064
## 10 Fruit and Field Beer 961
## 11 Stout - American 894
## 12 Stout - Sweet / Milk 808
## 13 Stout - Russian Imperial 807
## 14 Pilsner - German 775
## 15 Blonde Ale - American 754
## 16 Wheat Beer - Hefeweizen 705
## 17 Wheat Beer - Witbier 688
## 18 Brown Ale - American 662
## 19 Sour - Berliner Weisse 657
## 20 Wheat Beer - American Pale 602
First, let’s take a look at the raw number and percent of each style that is either rated above a 4 by BA users or above a 90 in the BA score. Above a 90 indicates that a beer is either in the “Outstanding” or “World Class” categories.
## num/percent of beers > 4 rating
top_beers3 %>%
group_by(broad_style) %>%
summarize(n = n(),
n_over4 = sum(avg >= 4, na.rm = TRUE),
percent_over4 = round((n_over4 / n),2)) %>%
filter(!is.na(broad_style)) %>%
ggplot(aes(percent_over4, reorder(broad_style, percent_over4), fill = broad_style)) +
geom_col() +
theme_classic() +
theme(legend.position = "none",
text = element_text(size=9)) +
labs(y = "Beer Style", x = "Percent of Beers with an Average Rating >= 4",
title = "Percent of Beers with an Average Rating >= 4") +
scale_x_continuous(labels = scales::percent_format())

top_beers3 %>%
group_by(broad_style) %>%
summarize(n = n(),
n_over4 = sum(avg >= 4, na.rm = TRUE),
percent_over4 = round((n_over4 / n),2)) %>%
filter(!is.na(broad_style)) %>%
ggplot(aes(n_over4, reorder(broad_style, n_over4), fill = broad_style)) +
geom_col() +
theme_classic() +
theme(legend.position = "none",
text = element_text(size=9)) +
labs(y = "Beer Style", x = "Number of Beers with an Average Rating >= 4",
title = "Number of Beers with an Average Rating >= 4")

top_beers3 %>%
group_by(broad_style) %>%
summarize(n = n(),
n_over90 = sum(score >= 90, na.rm = TRUE),
percent_over90 = round((n_over90 / n),2)) %>%
filter(!is.na(broad_style)) %>%
ggplot(aes(percent_over90, reorder(broad_style, percent_over90), fill = broad_style)) +
geom_col() +
theme_classic() +
theme(legend.position = "none",
text = element_text(size=9)) +
labs(y = "Beer Style", x = "Percent of Beers with a BA Score >= 90",
title = "Percent of Beers by General Style with a BA Score >= 90") +
scale_x_continuous(labels = scales::percent_format())

Wild ales and lambics do well just about however you slice the data. Obviously the list of lambic brewers and blenders in the world is small, but it’s still remarkable to have over 60% of a style rated as either outstanding or world class.
The fact that the single highest number of amazing (4+ by reader ratings) IPAs speaks to the insane number of IPAs produced right now. And it’s no surprise that stouts and wild ales are the second and (distant) third-place styles – those are usually the kinds of beers we think of as hype whales.
Also of note: based on reader averages, high-ABV styles like quads, (many) stouts, barleywines, and (some) IPAs are at or near the top of the list of the most 4+ rated beers. Again, these are whale-ish beers, like Dark Lord, which costs $170 for just five bottles, Hunahpu or Pliny the Younger.
What about the distribution of reviews within a style? Using boxplots and the interquartile range, we can take a look at the most and least-divisive beer styles. The idea here is that beer ratings for some styles are more clustered together than others, while some styles have both very high and very low-rated beers.
## distribution of reviews
top_beers3 %>%
filter(!is.na(broad_style)) %>%
ggplot(aes(avg, reorder(broad_style, avg), color = broad_style)) +
#geom_jitter(aes(alpha = 0.3)) +
geom_boxplot() +
theme_classic() +
theme(legend.position = "none",
text = element_text(size=9)) +
labs(y = "Beer Style", x = "Avg Rating")

## Most divisive styles (within-style IQR)
top_beers3 %>%
group_by(broad_style) %>%
summarize(iqr = IQR(avg)) %>%
filter(!is.na(broad_style)) %>%
arrange(desc(iqr)) %>%
slice_max(order_by = iqr, n = 20) %>%
ggplot(aes(iqr, reorder(broad_style, iqr), color = broad_style)) +
geom_point() +
geom_segment(aes(x = 0, xend = iqr, y = broad_style, yend = broad_style)) +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "none",
text = element_text(size=9)) +
labs(x = "Interquartile range", y = "Beer Style",
title = "Most Divisive Beer Styles")

## Most divisive substyles (within-style IQR)
top_beers3 %>%
group_by(beer_style) %>%
summarize(iqr = IQR(avg)) %>%
filter(!is.na(beer_style)) %>%
arrange(desc(iqr)) %>%
slice_max(order_by = iqr, n = 20) %>%
ggplot(aes(iqr, reorder(beer_style, iqr), color = beer_style)) +
geom_point() +
geom_segment(aes(x = 0, xend = iqr, y = beer_style, yend = beer_style)) +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=9),
legend.position = "none") +
labs(x = "Interquartile range", y = "Beer substyle",
title = "Most Divisive Beer Substyles")

# Least divisive substyles (within-style IQR)
top_beers3 %>%
group_by(beer_style) %>%
summarize(iqr = IQR(avg)) %>%
filter(!is.na(beer_style)) %>%
arrange(desc(iqr)) %>%
slice_min(order_by = iqr, n = 20) %>%
ggplot(aes(iqr, reorder(beer_style, -iqr), color = beer_style)) +
geom_point() +
geom_segment(aes(x = 0, xend = iqr, y = beer_style, yend = beer_style)) +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=9),
legend.position = "none") +
labs(x = "Interquartile range", y = "Beer substyle",
title = "Least Divisive Beer Substyles")

Some thoughts here:
As noted above, high-ABV beer styles are often highly-rated, but what about higher-ABV beers within a style?
## ABV vs. ratings (broad style and beer style)
top_styles <- top_beers3 %>%
count(broad_style, sort = TRUE) %>%
top_n(9)
top_beers3 %>%
filter(broad_style %in% top_styles$broad_style,
abv < 20) %>%
ggplot(aes(abv, avg)) +
geom_point(aes(alpha = 0.4, color = broad_style)) +
geom_smooth() +
facet_wrap(~broad_style, scales = "free_x") +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "none") +
labs(y = "Avg Rating", x = "ABV", title = "ABV vs. Avg User Rating by General Style")

top_styles <- top_beers3 %>%
count(beer_style, sort = TRUE) %>%
top_n(9)
top_beers3 %>%
filter(abv < 20,
beer_style %in% top_styles$beer_style) %>%
ggplot(aes(abv, avg)) +
geom_point(aes(alpha = 0.4, color = beer_style)) +
geom_smooth() +
facet_wrap(~beer_style, scales = "free_x") +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "none") +
labs(y = "Avg Rating", x = "ABV", title = "ABV vs. Avg User Rating by Substyle")

In general, for most of the top-9 styles, there’s not a strong relationship between ABV and average BA user rating. However, for stouts, wheat beers, and porters, there is a small positive correlation. Interestingly, for substyles, there’s not as strong of a relationship between ABV and rating for double IPAs as for imperial stouts.
We can also look at the top breweries by style. Here I use mean rating by style for brewers that have at least two beers with 10+ reviews on Beer Advocate.
## top breweries by style (brewery has to have 3 or more examples)
library(tidytext)
top_styles <- top_beers3 %>%
count(broad_style, sort = TRUE) %>%
top_n(9)
top_beers3 %>%
group_by(broad_style, brewery) %>%
summarize(n = n(),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
filter(n > 2,
!is.na(broad_style),
broad_style %in% top_styles$broad_style) %>%
group_by(broad_style) %>%
slice_max(n = 10, order_by = avg_rating) %>%
ggplot(aes(avg_rating, reorder_within(brewery, avg_rating, broad_style),
color = broad_style)) +
geom_point() +
geom_segment(aes(x = 0, xend = avg_rating,
y = reorder_within(brewery, avg_rating, broad_style),
yend = reorder_within(brewery, avg_rating, broad_style))) +
facet_wrap(~broad_style, scales = "free_y") +
scale_y_reordered() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=9),
legend.position = "none") +
labs(x = "Average Rating", y = "Brewery", title = "Top Breweries by Style")

top_styles <- top_beers3 %>%
count(beer_style, sort = TRUE) %>%
top_n(9)
top_beers3 %>%
group_by(beer_style, brewery) %>%
summarize(n = n(),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
filter(n > 2,
!is.na(beer_style),
beer_style %in% top_styles$beer_style) %>%
group_by(beer_style) %>%
slice_max(n = 10, order_by = avg_rating) %>%
ggplot(aes(avg_rating, reorder_within(brewery, avg_rating, beer_style),
color = beer_style)) +
geom_point() +
geom_segment(aes(x = 0, xend = avg_rating,
y = reorder_within(brewery, avg_rating, beer_style),
yend = reorder_within(brewery, avg_rating, beer_style))) +
facet_wrap(~beer_style, scales = "free_y") +
scale_y_reordered() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=7),
legend.position = "none") +
labs(x = "Average Rating", y = "Brewery", title = "Top Breweries by Substyle")

To me, one of the signs of a truly amazing brewer is that you can buy any beer they make, no matter the style, and have confidence that it’ll be a great beer. Creature Comforts, which is (well, was, when I lived in Athens) local to me, is one of those breweries. Sierra Nevada is another.
So, the following charts look at breweries that have top-20-rated beers in multiple styles. This is just one way to look at best breweries, but I think the results track pretty closely with who you’d expect.
## (number of styles that a brewery has a top 20 avg rating in)
top_beers3 %>%
group_by(broad_style, brewery) %>%
summarize(n = n(),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
filter(n > 2) %>%
group_by(broad_style) %>%
slice_max(order_by = avg_rating, n = 20) %>%
ungroup() %>%
count(brewery, sort = TRUE) %>%
slice_max(order_by = n, n = 20) %>%
ggplot(aes(n, reorder(brewery,n), fill = brewery)) +
geom_col() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=10),
legend.position = "none") +
labs(x = "Number of Styles w/ Top 20 Avg Rating", y = "Brewery",
title = "Breweries That Are Awesome at Multiple Styles (Avg Rating")

## By score > Rating
top_beers3 %>%
group_by(broad_style, brewery) %>%
summarize(n = n(),
avg_rating = round(mean(score, na.rm = TRUE), 2)) %>%
filter(n > 2) %>%
group_by(broad_style) %>%
slice_max(order_by = avg_rating, n = 20) %>%
ungroup() %>%
count(brewery, sort = TRUE) %>%
slice_max(order_by = n, n = 20) %>%
ggplot(aes(n, reorder(brewery,n), fill = brewery)) +
geom_col() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=10),
legend.position = "none") +
labs(x = "Number of Styles w/ Top 20 Avg Score", y = "Brewery",
title = "Breweries That Are Awesome at Multiple Styles (Avg Score")

top_beers3 %>%
group_by(beer_style, brewery) %>%
summarize(n = n(),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
filter(n > 2) %>%
group_by(beer_style) %>%
slice_max(order_by = avg_rating, n = 10) %>%
ungroup() %>%
count(brewery, sort = TRUE) %>%
slice_max(order_by = n, n = 20) %>%
ggplot(aes(n, reorder(brewery,n), fill = brewery)) +
geom_col() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
text = element_text(size=10),
legend.position = "none") +
labs(x = "Number of Beer Substyles w/ Top 20 Avg Rating", y = "Brewery",
title = "Breweries That Are Awesome at Multiple Substyles")

Trillium, Jackie O’s, Hill Farmstead, Tree House, Suarez, Sante Adairius, Holy Mountain, Side Project… these are all who you’d probably shortlist for the best breweries in the country.
## relationship bet score and avg rating
top_styles <- top_beers3 %>%
count(broad_style, sort = TRUE) %>%
top_n(9)
top_beers3 %>%
filter(broad_style %in% top_styles$broad_style) %>%
ggplot(aes(x = score, y = avg, color = broad_style, alpha = 0.4)) +
geom_point() +
theme_classic() +
theme(panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "none") +
facet_wrap(~ broad_style, scales = "free") +
labs(x = "Beer Advocate Score", y = "Avg User Rating",
title = "Comparison Between Beer Advocate Scores and User Ratings")

Beervana’s Jeff Alworth often likes to talk about beer being part of, and a reflection of, culture – and as a result, places develop local and regional specialties.
Even with large regional craft breweries like New Belgium or Boston Beer, states still develop local specialties – like IPAs and Pilsners in Oregon and NE IPAs in Vermont and Massachusetts.
So let’s make some maps. First we’ll make a basemap of the lower 48 states using the tigris package.
library(tigris)
library(sf)
library(leaflet)
us <- states(cb = TRUE, progress_bar = FALSE)
drop_states <- c("Commonwealth of the Northern Mariana Islands",
"United States Virgin Islands",
"Alaska", "Hawaii", "American Samoa", "Guam", "Puerto Rico")
us2 <- us %>%
filter(!NAME %in% drop_states) %>%
st_set_crs("4326")
Then we can find the top states per style – we’ll look at states with the most beers in a style (for the top 9 styles) either rated over 4 by users or over 90 by the BA score.
## top states per style
top_styles <- top_beers3 %>%
count(broad_style, sort = TRUE) %>%
top_n(9)
beer_state_styles <- top_beers3 %>%
group_by(broad_style, state) %>%
summarize(n_beers = n(),
n_over4 = sum(avg >= 4, na.rm = TRUE),
n_over90 = sum(score >= 90, na.rm = TRUE),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
filter(broad_style %in% top_styles$broad_style) %>%
right_join(us2, by = c("state" = "NAME")) %>%
st_sf(sf_column_name = "geometry", crs = 4326)
maps <- map(top_styles$broad_style, ~ ggplot(beer_state_styles) +
geom_sf() +
geom_sf(data = beer_state_styles %>% filter(broad_style == .x),
aes(fill = n_over4)) +
scale_fill_viridis_b(direction = -1) +
theme_void() +
ggtitle(.x))
wrap_plots(maps)
maps2 <- map(top_styles$broad_style, ~ ggplot(beer_state_styles) +
geom_sf() +
geom_sf(data = beer_state_styles %>% filter(broad_style == .x),
aes(fill = n_over90)) +
scale_fill_viridis_b(direction = -1) +
theme_void() +
ggtitle(.x))
wrap_plots(maps2)
States With the Most Beers with an Avg Rating > 4
States With the Most Beers with a BA Score > 90
It’s not a surprise that California is an IPA hotbed… but what about New York? And the lack of Oregon or Washington? For Oregon, I’m guessing it has to be because there are a bunch of highly-rated IPAs, but they don’t have more than 10 reviews, particularly because breweries are incentivized to constantly churn out new one-off beers (in 4-packs of 16oz cans with sticker labels, of course). Or – maybe Beer Advocate just isn’t as big of a thing in Oregon compared to other places like New York. shruggie emoji
Oregon’s awesome lager and wild/farmhouse ale breweries (get you a brewery – pFriem – that can do both IMO..) do come through in some of the other maps, though.
Also, note Texas coming through in farmhouse ales – that’s almost 100% just Jester King!
Another funny note – Florida is particularly strong in porters and stouts – not beers you’d necessarily associate with warm weather and the beach, but breweries like Angry Chair, J. Wakefield Brewing, Cigar City, and Funky Buddha all have amazing dark beers.
We can also make an interactive map for the whole country for a single style. Where would you go to find the best pilsners?
Note that the map has hover-over tooltips that show the number of beers with scores > 90 and ratings > 4, as well as average ratings and the top beer in that style in the state.
## leaflet map - Pilsners
make_leaflet <- function(style) {
style_scores <- top_beers3 %>%
filter(broad_style == style) %>%
group_by(state) %>%
summarize(n_beers = n(),
n_over4 = sum(avg >= 4, na.rm = TRUE),
n_over90 = sum(score >= 90, na.rm = TRUE),
avg_rating = round(mean(avg, na.rm = TRUE), 2)) %>%
left_join(top_beers3 %>%
filter(broad_style == style) %>%
group_by(state) %>%
slice_max(order_by = avg, n = 1) %>%
mutate(name = paste(brewery, name, sep = ": ")) %>%
select(top_beer = name)) %>%
right_join(us2, by = c("state" = "NAME")) %>%
st_sf(sf_column_name = "geometry", crs = 4326)
pal <- colorBin("viridis", style_scores$n_over90, pretty = TRUE, reverse = TRUE)
leaflet(style_scores) %>%
addProviderTiles(providers$Stamen.TonerLite,
options = providerTileOptions(minZoom = 2, maxZoom = 5)) %>%
addPolygons(fillColor = ~ pal(n_over90),
weight = 0.5, opacity = 1,
color = "black",
fillOpacity = 0.5, smoothFactor = 0.5,
label = paste0("N reviews > 4 = ", style_scores$n_over4, ", ",
"N scores > 90 = ", style_scores$n_over90, ", ",
"Avg Rating = ", style_scores$avg_rating, ", ",
"Top beer: ", style_scores$top_beer)) %>%
setView(-98.5795, 39.8282, zoom=3)
}
make_leaflet("Pilsner")
Or the best state for an IPA?
make_leaflet("IPA")
There’s a lot you can wring out of this data that I couldn’t get to here. In future posts I’d like to also extract the actual beer review texts and see whether it’s possible to build a model to predict a beer’s average rating based on the text and style, or whether we could predict the beer style based on the text. There are also countless more maps we could make… but this is a good start.